
Use runs-on GPU runners for CI #1439

Open
dfalbel wants to merge 9 commits into main from runs-on

Conversation


@dfalbel dfalbel commented Apr 15, 2026

Summary

  • Replace [self-hosted, gpu-local] runners with runs-on g4dn.xlarge (T4 GPU) instances for the test-gpu and test-cudatoolkit jobs
  • Uses the ubuntu24-gpu-x64 image which includes pre-installed NVIDIA drivers and container toolkit
  • Docker container setup (--gpus all --runtime=nvidia) is preserved

Test plan

  • Verify test-gpu job runs successfully on runs-on runner
  • Verify test-cudatoolkit job runs successfully on runs-on runner
  • Confirm GPU is accessible inside the container (nvidia-smi)
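
A minimal smoke test for the last item, as an R sketch (assumes the torch package is installed in the container; `nvidia-smi` ships with the NVIDIA driver):

```r
# Print the driver/GPU report, then assert torch can see the T4
status <- system2("nvidia-smi", stdout = TRUE, stderr = TRUE)
cat(status, sep = "\n")

stopifnot(torch::cuda_is_available())
```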

dfalbel added a commit to mlverse/cuda.ml that referenced this pull request Apr 29, 2026
* Use runs-on GPU runners for CI

Replace self-hosted GPU runners with runs-on g4dn.xlarge spot instances,
matching the approach in mlverse/torch#1439. Also modernizes the workflow:

- Action versions: checkout@v4, setup-python@v5, setup-r@v2, etc.
- Fix deprecated ::set-output → $GITHUB_OUTPUT
- Container: ubuntu18.04 → ubuntu20.04 (18.04 is EOL)
- Add --runtime=nvidia to container options
- Add concurrency groups with cancel-in-progress
- Simplify matrix to single config (CUDA 11.2.1, cuML 21.12, R release)
- Drop ASAN matrix dimension

* Revert container to ubuntu18.04 for CUDA 11.2 compatibility

* Use CUDA 11.2.2 container (11.2.1 removed from Docker Hub)

* Bump container to ubuntu20.04 (18.04 glibc too old for Node 20 actions)

* Split CI into build-image (free runner) and test-gpu (GPU runner)

- Build Docker image with cuda.ml pre-installed on ubuntu-latest (free)
- Run tests on runs-on g4dn.xlarge GPU runner using the pre-built image
- Add .github/docker/Dockerfile following the same pattern as mlverse/torch
- Make CMAKE_CUDA_ARCHITECTURES configurable via env var (defaults to NATIVE)
  so cross-compilation works on runners without a GPU (targets T4 = SM 75)
- Remove miniconda install (no longer needed for reticulate tests)
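
A hedged sketch of how the configure script might honor that variable (names here are illustrative, not the exact code):

```r
# Default to NATIVE so local builds detect the GPU at build time;
# GPU-less runners export CMAKE_CUDA_ARCHITECTURES=75 to target a T4.
cuda_archs <- Sys.getenv("CMAKE_CUDA_ARCHITECTURES", unset = "NATIVE")
cmake_args <- sprintf("-DCMAKE_CUDA_ARCHITECTURES=%s", cuda_archs)
```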

* Fix sklearn install: use scikit-learn package name and py_require()

The 'sklearn' PyPI package is deprecated in favor of 'scikit-learn'.
Also switch from py_install() to py_require() which is the modern
reticulate API for declaring Python dependencies.
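
In practice the change is roughly:

```r
# Before: deprecated PyPI name, eager install
# reticulate::py_install("sklearn")

# After: declare the dependency; reticulate resolves it on first use
reticulate::py_require("scikit-learn")
```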

* Fix configure warnings: normalizePath ordering and cmake unused variable

- Move download_libcuml() before normalizePath() so the directory exists
- Reference CUML_STUB_HEADERS_DIR in both Treelite found/not-found branches
  so cmake doesn't warn about unused variable
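
A sketch of the reordering (the path is illustrative):

```r
# download_libcuml() creates the directory; normalizePath() warns on
# paths that don't exist yet, so it has to run second.
download_libcuml()
libcuml_dir <- normalizePath("libcuml")
```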

* Fix TSVD tests for SVD sign ambiguity between cuML and sklearn

SVD components are only defined up to sign, so different implementations
can produce sign-flipped vectors that are mathematically equivalent.
Align signs before comparing components and transformed data.
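
A minimal, self-contained sketch of the alignment, assuming components are stored one per column (a hypothetical helper, not the test's exact code):

```r
align_signs <- function(ref, other) {
  # Flip each column of `other` to match the sign of the matching
  # column of `ref`; SVD components are only defined up to this flip.
  flips <- sign(colSums(ref * other))
  sweep(other, 2, flips, `*`)
}

ref   <- matrix(c(1, 2, 3, 4), ncol = 2)
other <- -ref                            # sign-flipped but equivalent
all.equal(ref, align_signs(ref, other))  # TRUE
```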

* Fix sklearn max_iter type: use integer (10000L) not float (10000.0)

Modern sklearn strictly validates that max_iter is an int. R's default
numeric type is double, which reticulate passes as a Python float.
Using 10000L ensures it's passed as a Python int.
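
The distinction is easy to see from R:

```r
library(reticulate)
class(r_to_py(10000))   # "python.builtin.float" -- rejected by sklearn
class(r_to_py(10000L))  # "python.builtin.int"   -- accepted
```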

* Add CRAN-like check job (no CUDA, stub headers, ubuntu-latest)

Runs R CMD check --as-cran on ubuntu-latest with R release and devel.
No nvcc/CUDA available, so the package builds with stub headers — matching
what CRAN would see.

* Update roxygen

* export S3 methods

* roxygen updates

* Fix CRAN check: escape Rd braces, skip tests without cuML

- Escape literal braces in roxygen comments across R source files and
  templates (e.g. {cuda.ml} -> \{cuda.ml\}, {"opt1",...} -> \{"opt1",...\})
- Regenerate all affected Rd files via devtools::document()
- Skip test_check() when cuML is not linked (CRAN-like environments)
- Use R CMD check directly in CRAN job (avoids rcmdcheck NOT_CRAN=true)
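
For instance (illustrative roxygen lines; the test-skip probe assumes a helper like the package's has_cuML()):

```r
# Before: Rd treats the braces as markup and R CMD check errors
#' Powered by the {cuda.ml} package. Accepts one of {"opt1", "opt2"}.

# After: literal braces escaped for Rd
#' Powered by the \{cuda.ml\} package. Accepts one of \{"opt1", "opt2"\}.

# tests/testthat.R: run the suite only when cuML is actually linked
if (cuda.ml::has_cuML()) {
  testthat::test_check("cuda.ml")
}
```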

* Fix examples brace escaping and register S3 methods

- Revert brace escaping inside @examples blocks (R code, not Rd markup)
- Define cuda_ml_can_predict_class_probabilities methods as proper
  functions so roxygen registers them as S3method() in NAMESPACE

* Add RAPIDS cuML 26.04 + CUDA 12 support

Build infrastructure:
- Dockerfile: CUDA 12.8.1 + Ubuntu 22.04 base image
- libcuml_versions.R: add 26.04 entry pointing to PyPI libcuml-cu12 wheel
- cuml.R: handle pip wheel extraction (lib64/ layout, .whl extension)
- configure.R: handle lib64/ vs lib/ for pip wheels
- CMakeLists.txt.in: C++17, rapids-cmake branch-26.04
- Workflow: target cuML 26.04

C++ API changes for cuML 26.04:
- svm_serde.h: namespace alias MLCommon::Matrix -> ML::matrix for
  KernelParams and KernelType (header renamed kernelparams.h ->
  kernel_params.hpp)
- fil.cu, fil_utils.h, fil_utils.cu: disable FIL on 26.04 with stubs
  (fil.h replaced by modular headers; full adaptation TODO)
- random_projection.cu: disable on 26.04 with stubs (C++ API removed)
- knn.cu: disable on 26.04 with stubs (raft::spatial::knn types removed)
- random_forest_classifier.cu, random_forest_regressor.cu: guard FIL
  prediction paths for 26.04

Backward compatible: cuML 21.x with CUDA 11 still works.

* Test both cuML 21.12 and 26.04 in CI

- Dockerfile: accept CUDA_IMAGE as build arg for different base images
- Workflow: matrix over cuML 21.12 (CUDA 11.2) and 26.04 (CUDA 12.8)
- Each version gets its own build-image and test-gpu job

* Fix rapids-cmake version and lib symlink for dual cuML support

- CMakeLists.txt.in: template RAPIDS_CMAKE_TAG and CMAKE_CXX_STANDARD
  so they adapt to the cuML version being built against
- configure.R: set rapids-cmake tag (v26.04.00 for 26.x, branch-21.10
  for 21.x) and C++ standard (17 for 26.x, 14 for 21.x)
- cuml.R: don't create premature lib symlink in download_libcuml()

* Derive rapids-cmake tag from cuML version instead of hardcoding

Use vYY.MM.00 for cuML >= 23.02 (stable tags), vYY.MM.00a for older
versions (only alpha tags available).
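
A hedged sketch of the derivation (function name is illustrative):

```r
rapids_cmake_tag <- function(cuml_version) {
  v <- numeric_version(cuml_version)
  parts <- unclass(v)[[1]]                   # e.g. c(21, 12)
  tag <- sprintf("v%02d.%02d.00", parts[1], parts[2])
  if (v < "23.2") paste0(tag, "a") else tag  # only alpha tags pre-23.02
}

rapids_cmake_tag("21.12")  # "v21.12.00a"
rapids_cmake_tag("26.04")  # "v26.04.00"
```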

* Require cmake 3.30.4+ for cuML 26.04 (auto-downloaded if missing)

rapids-cmake v26.04 needs cmake >= 3.30.4. The existing auto-download
logic handles this, but the min version threshold was hardcoded to 3.21.1.
Now it's 3.30.4 for cuML >= 23.02, 3.21.1 for older versions.
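
The threshold selection then reduces to (sketch, names illustrative):

```r
min_cmake_version <- function(cuml_version) {
  if (numeric_version(cuml_version) >= "23.2") "3.30.4" else "3.21.1"
}
```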

* Fix cuML 26.04 build: raft/rmm deps, static_assert, device_allocator

- Download libraft-cu12 and librmm-cu12 wheels alongside libcuml-cu12
  (cuml headers include raft/rmm headers which are in separate packages)
- Merge raft/rmm headers into libcuml/include/ during download
- Remove static_assert(CUML_VERSION_MAJOR == 21) — allow 26+
- Guard raft::mr::device::allocator (removed in raft 26.x) with version
  conditionals in device_allocator.cu/.h and stream_allocator.cu
- Use raft/core/handle.hpp instead of raft/handle.hpp for v26+

* Resolve cuML PyPI deps dynamically instead of hardcoding URLs

- Add tools/config/utils/pypi.R with resolve_native_deps() that walks
  the PyPI dependency tree for a package and returns download URLs for
  all native C++ dependencies (libraft, librmm, rapids-logger, cccl, etc.)
- libcuml_versions.R: cuML 26.04 entry is now just "libcuml-cu12"
  (the PyPI package name), not a hardcoded URL
- cuml.R: download_libcuml() detects PyPI package names vs direct URLs,
  resolves the full dep tree, downloads all wheels, and merges their
  include/ directories into libcuml/include/
- configure.R: load pypi.R utility
- Uses jsonlite for PyPI JSON API parsing
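
A simplified sketch of the PyPI lookup underneath resolve_native_deps() (the real helper also walks requires_dist recursively; this only fetches one package's wheel URL):

```r
library(jsonlite)

pypi_wheel_url <- function(pkg) {
  # Public PyPI JSON API: metadata for the latest release of `pkg`
  meta <- fromJSON(sprintf("https://pypi.org/pypi/%s/json", pkg))
  urls <- meta$urls$url
  urls[grepl("\\.whl$", urls)][1]   # first wheel artifact
}

pypi_wheel_url("libcuml-cu12")
```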

* Download CCCL 3.3 headers for cuML 26.04 builds

RMM 26.04 headers require CCCL >= 3.3 at compile time, but CCCL is not
a pip dependency (it's normally bundled with the CUDA toolkit). CUDA 12.x
ships CCCL 2.x which is too old. Download CCCL v3.3.0 from GitHub
releases (header-only, ~2MB) and merge into libcuml/include/.

Also handle pip wheels that extract to nested dirs like
nvidia/<subpackage>/include/.

* Put CUML_INCLUDE_DIR before CUDA toolkit includes

CCCL 3.3 headers (bundled in libcuml/include/) must take precedence
over the CUDA 12 toolkit's older CCCL 2.x headers. Swap include order
so cuml/raft/rmm/cccl headers are found first.

* Fix CCCL compat, pinned_allocator removal, and raft handle API

- Use RAPIDS-pinned CCCL commit (CUDA 12 compatible) instead of v3.3.0
  release tag which includes CUDA 13-only code
- pinned_host_vector.h: guard thrust::cuda::experimental::pinned_allocator
  (removed in CCCL 3.x); use plain host_vector on v26+
- handle_utils.cu: raft::handle_t no longer has set_stream(); reconstruct
  with stream_view via constructor on v26+

* Switch to cuML 25.12 (no CCCL 3.x requirement)

cuML 26.04's rmm headers require CCCL >= 3.3 which conflicts with
CUDA 12.x toolkit's CCCL 2.x. cuML 25.12 vendors its own CCCL in
librmm/include/rapids/ and has no CCCL version check — clean CUDA 12
compatibility.

- Target cuML 25.12 instead of 26.04
- Version guards: >= 26 -> >= 25 (same API changes apply)
- Re-enable KNN (knn.hpp exists in 25.12 with same API)
- Remove CCCL GitHub download (not needed)
- Update PyPI resolver to handle version pins (==25.12.*)
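
Handling a pin like libcuml-cu12==25.12.* might look like this (hypothetical helper, not the resolver's exact code):

```r
parse_pin <- function(spec) {
  # Split "name==pin"; pin is NULL when the spec has no version pin
  parts <- strsplit(spec, "==", fixed = TRUE)[[1]]
  list(name = parts[1], pin = if (length(parts) > 1) parts[2] else NULL)
}

parse_pin("libcuml-cu12==25.12.*")  # name "libcuml-cu12", pin "25.12.*"
parse_pin("libcuml-cu12")           # pin NULL
```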

* Define LIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE for RMM

RMM headers require this define (normally set automatically by RMM's
cmake config, but we're using headers directly from the pip wheel).

* Revert cuML 25.x/26.x support (CCCL 3.x incompatible with CUDA 12)

All RAPIDS 25.x+ pip wheels require CCCL 3.x headers which are
incompatible with CUDA 12's bundled CCCL 2.x. No version of
libcuml-cu12 can be compiled against a stock CUDA 12 toolkit.

Revert to cuML 21.12 as the default for now. Supporting newer cuML
will require either CUDA 13 or a custom build environment.

---------

Co-authored-by: Tomasz Kalinowski <kalinowskit@gmail.com>
